Mixture-model cluster analysis using information theoretical criteria

نویسندگان

  • Jaime R. S. Fonseca
  • Margarida G. M. S. Cardoso
چکیده

The estimation of mixture models has been proposed for quite some time as an approach for cluster analysis. Several variants of the Expectation-Maximization algorithm are currently available for this purpose. Estimation of mixture models simultaneously allows the determination of the number of clusters and yields distributional parameters for clustering base variables. There are several information criteria that help to support the selection of a particular model or clustering structure. However, a question remains concerning the selection of specific criteria that may be more suitable for particular applications. In the present work we analyze the relationship between the performance of information criteria and the type of measurement of clustering variables. In order to study this relationship we perform the analysis of forty-two data sets with known clustering structure and with clustering variables that are categorical, continuous and mixed type. We then compare eleven information-based criteria in their ability to recover the data sets’ clustering structures. As a result, we select AIC3, BIC and ICL-BIC criteria as the best candidates for model selection that refers to models with categorical, continuous and mixed type clustering variables, respectively.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparison of Information Criteria in Clustering Based on Mixture of Multivariate Normal Distributions

Clustering analysis based on a mixture of multivariate normal distributions is commonly used in the clustering of multidimensional data sets. Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions. Model selection involves the determination of the number of components (clusters) and the selection of an appropri...

متن کامل

On the Performance of Information Criteria in Latent Segment Models

Nevertheless the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segments structures and it is desirable that the selection criteria used for this end are effective. In order to select among several information criteria, which ma...

متن کامل

Fuzzy Cluster Validation Using the Partition Negentropy Criterion

We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by means of the negentropy, and penalizes the overlap, measured by the partition entropy. The PNC is aimed at finding well separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partiti...

متن کامل

Segmentation of Brain MR Images based on Finite Skew Gaussian Mixture Model with Fuzzy C-Means Clustering and EM Algorithm

Segmentation is a process of converting inhomogeneous data into homogeneous data. There are many segmentation techniques available inthe literature. Among these techniques, finite Gaussian Mixture Model using EM algorithm is one mostly used. However, Gaussian Mixture Model is suited well when the image under consideration is symmetric. But in reality, medical images are asymmetric. Hence, it is...

متن کامل

A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data

MOTIVATION Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem. RESULTS We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Intell. Data Anal.

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2007